Yoshua Bengio
AI showing signs of self-preservation and humans should be ready to pull plug, says pioneer
Yoshua Bengio, a Canadian professor of computing, says the idea that chatbots are becoming conscious is 'going to drive bad decisions'.

A pioneer of AI has criticised calls to grant the technology rights, warning that it was showing signs of self-preservation and that humans should be prepared to pull the plug if needed. Yoshua Bengio said giving legal status to cutting-edge AIs would be akin to giving citizenship to hostile extraterrestrials, amid fears that advances in the technology were far outpacing the ability to constrain them. Bengio, chair of a leading international AI safety study, said the growing perception that chatbots were becoming conscious was "going to drive bad decisions".
- North America > United States (0.17)
- Europe > Ukraine (0.07)
- Oceania > Australia (0.05)
- North America > Canada > Quebec > Montreal (0.05)
- Leisure & Entertainment > Sports (0.72)
- Government > Regional Government (0.52)
- Information Technology > Communications > Social Media (0.75)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Equilibrium Propagation Without Limits
We liberate Equilibrium Propagation (EP) from the limit of infinitesimal perturbations by establishing a finite-nudge foundation for local credit assignment. By modeling network states as Gibbs-Boltzmann distributions rather than deterministic points, we prove that the gradient of the difference in Helmholtz free energy between a nudged and free phase is exactly the difference in expected local energy derivatives. This validates the classic Contrastive Hebbian Learning update as an exact gradient estimator for arbitrary finite nudging, requiring neither infinitesimal approximations nor convexity. Furthermore, we derive a generalized EP algorithm based on the path integral of loss-energy covariances, enabling learning with strong error signals that standard infinitesimal approximations cannot support.
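The core identity claimed in the abstract can be sketched as follows. The notation here (energy E_θ, loss nudge ℓ, finite nudging strength β, unit temperature) is assumed for illustration, not taken from the paper:

```latex
% Assumed notation: E_\theta(s) network energy, \ell(s) loss nudge,
% \beta a finite nudging strength, temperature set to 1.
% Helmholtz free energy of the nudged Gibbs-Boltzmann distribution:
F_\beta(\theta) = -\log \sum_s \exp\!\big(-E_\theta(s) - \beta\,\ell(s)\big),
\qquad
\frac{\partial F_\beta}{\partial \theta}
  = \mathbb{E}_{s \sim p_\beta}\!\left[\frac{\partial E_\theta(s)}{\partial \theta}\right].

% Hence, for arbitrary finite \beta (no infinitesimal limit, no convexity):
\frac{\partial}{\partial \theta}\big(F_\beta - F_0\big)
  = \mathbb{E}_{p_\beta}\!\left[\frac{\partial E_\theta}{\partial \theta}\right]
  - \mathbb{E}_{p_0}\!\left[\frac{\partial E_\theta}{\partial \theta}\right].
```

The right-hand side is exactly the difference of expected local energy derivatives between the nudged and free phases, i.e. the classic Contrastive Hebbian Learning update.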
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- (4 more...)
gfnx: Fast and Scalable Library for Generative Flow Networks in JAX
Tiapkin, Daniil, Agarkov, Artem, Morozov, Nikita, Maksimov, Ian, Tsyganov, Askar, Gritsaev, Timofei, Samsonov, Sergey
In this paper, we present gfnx, a fast and scalable package for training and evaluating Generative Flow Networks (GFlowNets) written in JAX. gfnx provides an extensive set of environments and metrics for benchmarking, accompanied by single-file implementations of core objectives for training GFlowNets. We include synthetic hypergrids, multiple sequence generation environments with various editing regimes and particular reward designs for molecular generation, phylogenetic tree construction, Bayesian structure learning, and sampling from the Ising model energy. Across different tasks, gfnx achieves significant wall-clock speedups compared to PyTorch-based baselines (such as the torchgfn library) and author implementations. For example, gfnx achieves up to 55 times speedup on CPU-based sequence generation environments, and up to 80 times speedup in the GPU-based Bayesian network structure learning setup. Our package provides a diverse set of benchmarks and aims to standardize empirical evaluation and accelerate research and applications of GFlowNets. The library is available on GitHub (https://github.com/d-tiapkin/gfnx) and on PyPI (https://pypi.org/project/gfnx/). Documentation is available at https://gfnx.readthedocs.io.
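For context on the kind of objective such libraries train, here is a minimal, dependency-free Python sketch of the trajectory-balance residual, the core GFlowNet training loss. The function name and toy numbers are illustrative only and are not part of the gfnx API (see the documentation linked above for the real interface):

```python
import math

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Squared trajectory-balance residual for one complete trajectory.

    log_Z:      learned estimate of the log partition function (scalar)
    log_pf:     list of forward-policy terms log P_F(s_{t+1} | s_t)
    log_pb:     list of backward-policy terms log P_B(s_t | s_{t+1})
    log_reward: log R(x) of the terminal object
    """
    residual = log_Z + sum(log_pf) - log_reward - sum(log_pb)
    return residual * residual

# Toy check: if the forward flow already matches the reward exactly,
# the residual (and hence the loss) is zero.
log_pf = [math.log(0.5), math.log(0.5)]   # 2-step trajectory, P_F = 0.25
log_pb = [0.0, 0.0]                        # deterministic backward policy
log_reward = math.log(0.25)                # R(x) equals the P_F product
loss = trajectory_balance_loss(log_Z=0.0, log_pf=log_pf,
                               log_pb=log_pb, log_reward=log_reward)
```

In a real library this residual is computed over batches of sampled trajectories and minimized jointly over the policy parameters and log_Z; gfnx's JAX implementation additionally vectorizes and JIT-compiles this inner loop, which is where the reported speedups come from.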
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- (6 more...)
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > California (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- North America > Canada > Quebec (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
FlowRL: Matching Reward Distributions for LLM Reasoning
Zhu, Xuekai, Cheng, Daixuan, Zhang, Dinghuai, Li, Hengli, Zhang, Kaiyan, Jiang, Che, Sun, Youbang, Hua, Ermo, Zuo, Yuxin, Lv, Xingtai, Zhang, Qizheng, Chen, Lin, Shao, Fanghao, Xue, Bo, Song, Yunchong, Yang, Zhenjie, Cui, Ganqu, Ding, Ning, Gao, Jianfeng, Liu, Xiaodong, Zhou, Bowen, Mei, Hongyuan, Lin, Zhouhan
We propose FlowRL: matching the full reward distribution via flow balancing instead of maximizing rewards in large language model (LLM) reinforcement learning (RL). Recent advanced reasoning models adopt reward-maximizing methods (e.g., PPO and GRPO), which tend to over-optimize dominant reward signals while neglecting less frequent but valid reasoning paths, thus reducing diversity. In contrast, we transform scalar rewards into a normalized target distribution using a learnable partition function, and then minimize the reverse KL divergence between the policy and the target distribution. We implement this idea as a flow-balanced optimization method that promotes diverse exploration and generalizable reasoning trajectories. We conduct experiments on math and code reasoning tasks: FlowRL achieves a significant average improvement of 10.0% over GRPO and 5.1% over PPO on math benchmarks, and performs consistently better on code reasoning tasks. These results highlight reward-distribution matching as a key step toward efficient exploration and diverse reasoning in LLM reinforcement learning.
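The distribution-matching idea can be sketched in a few lines of plain Python. This is illustrative only: FlowRL's flow-balanced objective operates over full reasoning trajectories with a learnable partition function, whereas here the partition function is computed exactly over a toy discrete outcome set:

```python
import math

def reverse_kl(policy_probs, rewards):
    """Reverse KL D(pi || p*) between a policy and the reward-induced
    target distribution p*(y) = exp(r(y)) / Z.

    For illustration Z is summed exactly; FlowRL instead learns the
    partition function, which is what makes the idea tractable for LLMs.
    """
    log_Z = math.log(sum(math.exp(r) for r in rewards))
    kl = 0.0
    for p, r in zip(policy_probs, rewards):
        log_target = r - log_Z          # log p*(y)
        kl += p * (math.log(p) - log_target)
    return kl

rewards = [1.0, 2.0, 0.5]
Z = sum(math.exp(r) for r in rewards)

# A policy that exactly matches the target distribution has zero reverse KL.
target = [math.exp(r) / Z for r in rewards]

# A peaked, reward-maximizing policy pays a positive KL penalty: it
# collapses onto the dominant mode and discards valid low-reward paths.
peaked = [0.001, 0.998, 0.001]
```

Minimizing this divergence rewards the policy for covering every mode of the reward distribution in proportion to its reward, which is the diversity mechanism the abstract contrasts with PPO- and GRPO-style reward maximization.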
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > France (0.04)
- (3 more...)
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (4 more...)
- Leisure & Entertainment > Games > Computer Games (0.46)
- Health & Medicine > Therapeutic Area > Neurology (0.46)